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ABSTRACT 

We present the Transiting Exoearth Robust Reduction Algorithm (TERRA) — a 
novel framework for identifying and removing instrumental noise in Kepler photometry. 
We identify instrumental noise modes by finding common trends in a large ensemble 
of light curves drawn from the entire Kepler field of view. Strategically, these noise 
modes can be optimized to reveal transits having a specified range of timescales. For 
Kepler target stars of low photometric noise, TERRA produces ensemble-calibrated 
photometry having 33 ppm RMS scatter in 12-hour bins, rendering individual transits 
of earth-size planets around sun-like stars detectable as ~ 3a signals. 

Subject headings: techniques: photometric — planetary systems 



1. Introduction 



The Kepler Mission is ushering in a new era of exoplanet science. Landmark discoveries include 



Kepler-lOb, a rocky planet (Batalha et al. 2011); the Kepler-11 system of six transiting planets 



(Lissauer et al. 2011); earth-sized Kepler-20e and 20f (Fressin et al. 2012); KOI-961b, c, and d 



all smaller than earth (Muirhead et al. 2012); and Kepler-16b a circumbinary planet (Doyle et al. 



2011). While Kepler has revealed exciting individual systems, the mission's legacy will be the first 



statistical sample of planets extending down to earth size and out to 1 AU. Kepler is the first 
instrument capable of answering "How common are earths?" — A question that dates to antiquity. 

Planet candidates are detected by a sophisticated pipeline developed by the Kepler team 
Science Operations Center. In brief, systematic effects in the photometry are suppressed by the 
Pre-search Data Conditioning (PDC) module, the output of which is fed into the Transiting Planet 



Search (TPS) module. For further information, see Jenkins et al. (2010). 



The Kepler mission was designed to study astrophysical phenomena with a wide range of 
timescales, which include 1-hour transits of hot Jupiters, 10-hour transits of planets at 1 AU, and 
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weeklong spot modulation patterns. The PDC module is charged with removing instrumental noise 
while preserving signals with a vast range of timescales. We review sources of instrumental errors 
in § [2j highlighting the effects that are most relevant to transit detection. 

The Kepler team has released candidate planets based on the first 4 and 16 months of data 
( Borucki et al.|20lT Batalha et al.|201~2 ) . Many of the candidates have additional followup observa- 
tions from the ground and space aimed at ruling out false positive scenarios. In addition, statistical 
arguments suggest that 90-95% of all candidates and that ~ 98% of candidates in multi-candidate 



systems are bonafide planets (Morton & Johnson 2011; Lissauer et al. 2012) 



While Kepler's false positive rate is low, its completeness is largely uncharacterized. If the 
completeness decreases substantially with smaller planet size or longer orbital periods, the interpre- 



tations regarding occurrence drawn from the Borucki et al. ( 2011 ) and Batalha et al. ( 2012 ) catalogs 



will be incorrect. Hunting for the smallest planets, including earth-sized planets in the habitable 
zone, will require exquisite suppression of systematic effects. Without optimal detrending, system- 
atic noise will prevent the detection of the smallest planets, possibly the habitable-zone earth-sized 
planets, which is the main goal of the Kepler mission. Therefore, it is essential for independent 
groups to develop pipelines that compliment both PDC and TPS. An early example of an outside 



group successfully identifying new planet candidates is the Planet Hunters project (Fischer et al 



20iTj |Lintott et al.||2012[ ), which uses citizen scientists to visually inspect light curves. In addition, 



existing pipelines from the HAT ground-based search (Huang et al. 2012) and the CoRoT space 
mission (Ofir & Dreizler 2012) have been brought to bear on the Kepler dataset yielding ~ 100 



new planet candidates. 

We present the Transiting Exoearth Robust Reduction Algorithm (TERRA) — a framework 
for identifying and removing systematic noise. We identify systematic noise terms by searching 
for photometric trends common to a large ensemble of stars. Our implementation is tuned toward 
finding trends with transit-length timescales. 



2. Instrumental Noise in Kepler Photometry 



The Kepler spacecraft makes photometric observations of ~156,000 targets. Long cadence 
photometry is computed by summing all the photoelectrons within a predefined target aperture 
during a 29.4 minute integration. The Kepler team makes this "Simple Aperture Photometry" 



available to the scientific community (Fraquelli Sz Thompson 2012). Simple aperture photometry 



contains many sources of noise other than Poisson shot noise. We illustrate several noise sources 
in Figure [TJ where we show the normalized photometry (SF) of KIC-8144222 [Kp = 12.4). 5F = 
(F — F) / F where F is the simple aperture photometry. 

The dominant systematic effect on multi-quarter timescales is "differential velocity aberration" 



(Van Cleve <fc Caldwell 2009). As Kepler orbits the sun, its velocity relative to the Kepler field 



changes. When the spacecraft approaches the Kepler field, stars on the extremities of the field 
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Fig. 1 — Top: Normalized flux from KIC-8144222 (Kp =12.4, CDPP12=35.4 ppm) from Quarter 
1 through 8 (Q1-Q8). Bottom: Detail of Q6 photometry showing KIC-8144222 along with three 
stars of similar brightness, noise level, and location on the FOV (12.0 < Kp < 13.0, CDPP12 < 40 
ppm, mod. out = 16.1). Much of the variability is common to the 4 stars and therefore instrumental 
in origin. The two spikes are due to thermal settling events, and the three-day ripples are due to 
onboard momentum management. 
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move toward the center. Stellar PSFs move over Kepler apertures by ~ 1 arcsecond resulting in a 
~ 1 % effect over 1-year timescales. 

We show a detailed view of KIC-8144222 photometry from Quarter 6 (Q6) in Figure [TJ The 
decaying exponential shapes are caused by thermal settling after data downlinks. Each month, 
Kepler rotates to orient its antenna toward earth. Since Kepler is not a uniformly colored sphere, 
changing the spacecraft orientation with respect to the sun changes its overall temperature. After 
data downlink, Kepler takes several days to return to its equilibrium temperature (Jeffrey Smith, 
private communication, 2012). KIC-8144222 photometry also shows a ~0.1% effect with a 3-day 
period due to thermal coupling of telescope optics to the reaction wheels. We explore this 3-day 
cycle in depth in § |3.3| 



Since all of the previously mentioned noise sources are coherent on timescales longer than one 
cadence (29.4 minutes), the RMS of binned photometry does not decrease as 1/y/N, where N is 
the number of measurements per bin. In order to describe the noise on different timescales, the 
Kepler team computes quantities called CDPP3, CDPP6, and CDPP12 which are measures of 
the photometric scatter in 3, 6, and 12-hour bins. KIC-8144222 has CDPP12 35.4 ppm and is a 
low-noise star (bottom 10 percentile). For a more complete description of noise in Kepler data see 



Christiansen et al.l (2011 ) 



As a comparison, we selected stars which were similar to KIC-8144222 in position on the Field 
of View (FOV), noise level, and brightness (mod.out = 16.1, CDPP12 < 40 ppm, 12.0 < Kp < 13.0). 
From this 13-star sample, we randomly selected 3 stars and show their light curves in Figure [T] The 
photometry from the comparison stars is strikingly similar to the KIC-8144222 photometry. Since 
much of the variability is correlated, it must be due to the state of the Kepler spacecraft. Common 
trends among stars can be identified and removed. The Kepler team calls this "cotrending," a term 
we adopt. 

Correlated noise with timescales between 1 and 10 hours can mimic planetary transits and 
requires careful treatment. To illustrate the transit-scale correlations among a large sample of 
stars, we show a correlation matrix constructed from 200 Q6 light curves in Figure [2] The Kepler 
photometer is an array of 42 CCDs arranged in 21 modules ( Fraquelli &: Thompson||2012 ). We 



organized the rows and columns of the correlation matrix by module. We constructed the correlation 
matrix using the following steps: 

1. We randomly selected 10 light curves from each of the 20 total module^] from stars with the 
following properties: 12.5 < Kp < 13.5 and CDPP12 < 40 ppm. 

2. To highlight transit-scale correlations, we subtracted a best fit spline from the photometry. 
The knots of the spline are fixed at 10-day intervals so that we remove trends > 10 days. 



1 Module 3 failed during Q4 (Christiansen et al 



20111 
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3. We normalized each light curve so that its median absolution deviation (MAD) is unity. 

4. We evaluated the pairwise correlation (Pearson-R) between all 200 stars. 

The correlation matrix shows that stars in some modules (e.g. module 2) correlate strongly with 
other stars in the same module. However, other modules (e.g. module 12) shows little inter-module 
correlation. Finally, the large off-diagonal correlations show that stars in some modules correlate 
strongly with stars in different modules. 



3. Identification of Photometric Modes 

We have shown that there is significant high-frequency (< 10 days) systematic noise in Kepler 
photometry. In order to recover the smallest planets, this noise must be carefully characterized and 
removed. We isolate systematic noise by finding common trends in a large ensemble of stars. This is 
an extension of differential photometry, widely used by ground-based transit surveys to calibrate out 
the time- variable effects of the earth's atmosphere. We find these trends using Principle Component 



Analysis (PC A). This is similar to the Sys-Rem, TFA, and PDC algorithms (Tamuz et al. 2005 



Kovacs et al. 2005 Twicken et al. 2010), but our implementation is different. We briefly review 



PCA in the context of cotrending a large ensemble of light curves. 



3.1. PCA on Ensemble Photometry 

Consider an ensemble of N light curves each with M photometric measurements. We can think 
of the ensemble as a collection of N vectors in an M-dimensional space. Each light curve 5F can 
be written as a linear combination of M basis vectors that span the space, 



5Fi = ax^Vx + . . . + ai. M V M 



SF N = QN t iVi + . . . + a N , M V M 



(1) 



where each of the Vj basis vectors is the 


Equation [T] can be written more compactly 


where 








D = 


; U = 




\SF N J \ 



D = AV 

a\,i ... 
\a Ny i . . . 




a-N,M 
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Fig. 2. — Top: Correlation matrix constructed from 200 Q6 light curves. The correlation (R-value) 
between two stars is represented by the gray scale, which ranges from 0.1 to 0.9. The diagonal 
elements have R = 1. The stars are ordered according to module and the red lines delineate 
one module from another. We enlarge several 10x10 regions in the lower panels. Stars in some 
modules (such as module 2) are highly correlated, while other modules (such as module 12) show 
little correlation. The module 22 - module 16 correlation matrix is an example of significant intra- 
module correlation. 
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Singular Value Decomposition (SVD) simultaneously solves for the basis vectors V and the coeffi- 
cient matrix A because it decomposes any matrix D into 

D = USV T . 

V is an M x M matrix where the columns are the eigenvectors of D T D or "principle components," 
and the diagonal elements of S are the corresponding eigenvalues. The eigenvalues . . . , sm,m} 
describe the extent to which each of the principle components capture variability in the ensemble 
and are ordered from high to low. The columns of U are the eigenvectors of DD T . Both U and V 
are unitary matrices, i.e. UU T = I and VV T = I. 

As we saw in § [2j stars show common photometric trends due to changes in the state of the 
Kepler spacecraft. The most significant principle components will correspond to these common 
trends. If we identify the first NMode principle components as instrumental noise modes, we can 
remove them via 

NMode 

5F itCa i = 5Fi- a i,3 V i ( 2 ) 
j=i 

where 5F ca i is an ensemble-calibrated light curve. However since the collection of {Vi, . . . Vm} 
spans the space, the higher principle components describe astrophysical variability, shot noise, and 
exoplanet transits. We must be careful not to remove too many components because we would be 
removing the signals of interest. 



3.2. PCA implementation 

We construct a large reference ensemble of light curves {SF\, . . . , SFpj} of 1000 stars (12.5 < 
Kp < 13.5, CDPP12 < 40 ppm) drawn randomly from the entire FOV. Before performing SVD, 
we remove thermal settling events and trends > 10 days as described in § [2] Since SVD finds the 
eigenvectors of D T D it is susceptible to outliers as is any least squares estimator. We perform a 
robust SVD that relies on iterative outlier rejection following these steps: 

1. Find principle components and weights for light curve ensemble. 

2. The i th light curve is considered an outlier if any of the mode weights (oj,i, . . • , a^) differ 
significantly from the typical mode weight in the ensemble. We consider dij to be significantly 
different from the ensemble if 

Kj : -med(qj)| 
MAD(oj) 

where med(aj) and MAD(aj) are the median value and the median absolute deviation of all 
the dj mode weights. 

3. Remove outlier light curves from the ensemble. 
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4. Repeat until no outliers remain. 

For our 1000-star sample we identified and removed 51 stars from our ensemble. These stars 
tended to have high amplitude intrinsic astrophysical variability, i.e. due to spots and flares. We 
plot the four most significant TERRA principle components in Figure [3] and offer some physical 
interpretations of the mechanisms behind these modes in the following section. 



3.3. Interpretation of Photometric Modes 

In this section, we associate the variability captured in the principle components to changes 
in the state of the Kepler spacecraft that couple to photometry. The three-day cycle isolated in 
our first principle component is due to a well-known, three-day momentum management cycle on 



the spacecraft (Christiansen et al. 2011). To keep a fixed position angle, Kepler must counteract 
external torques by spinning up reaction wheels. These reaction wheels have frictional losses which 
leak a small amount of heat into the spacecraft, which changes the PSF width and shape of the 
stars. 

We can gain a more detailed understanding of this effect, by examining how the mode weights 
for each reference star corresponding to Vi, i.e. {ai,i, . . . , a/v,i}, vary across the FOV. We display 
the RA and Dec positions of our 1000-star sample in Figure [4] and color-code the points with the 
value of dj. The a\ and a,2 mode weights show remarkable spatial correlation across the FOV. That 
a\ is positive in the center of the FOV and negative at the edges of the FOV means the systematic 
photometric errors in these two regions respond to the momentum cycle in an anticorrelated sense. 
The telescope is focused such that the PSF is sharpest at intermediate distances from the center of 



the FOV. Since stars in the center and on the extreme edges have the blurriest PSFs (Van Cleve 



& Caldwell 2009), they respond most strongly to the momentum cycle. 



The mechanism behind the variability seen in V2 is less clear. V2 includes a high frequency 
component with a period of 1.68 hours. The Kepler team has also noticed this periodicity in the 
pixel scale (Douglas Caldwell, private communication, 2012). A possible explanation is thermal 
coupling of the telescope optics to a heater that turns off and on with a ~20 minute period. The 
1.68 hour variability would be an alias of this higher frequency with the observing cadence of 
29.4 minutes. The gradient in a2 across the FOV suggests the heater is coupled to the telescope 
optics in a tip/tilt rather than piston sense. 

The higher-order components 0,3 and 04 do not show significant spatial correlation, which 
suggests that V3 and V4 are not due to changes in the local PSF. Since V3 and V4 have a ~10-day 
timescale, they could be the high frequency component of the differential velocity aberration trend 
that was not removed by our 10-day spline. 
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Fig. 3. — Top: The first four TERRA principle components in our 1000-light curve ensemble 
plotted in order of significance. V\ has a 3-day periodicity and is due to changes in the thermal 
state of the spacecraft caused by a 3-day momentum management cycle. V% has a high frequency 
component (P = 1.68 hours) that could be due to a 20 minute thermal cycle from an onboard 
heater aliased with the 29.4 minute observing cadence alias of a 20- minute thermal cycle driven by 
an onboard heater. 
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Fig. 4. — The RA and Dec positions of our 1000-star ensemble. The points are color-coded by 
a,i, the weights for mode V%. Negative values are shown in blue and positive values are shown in 
red. The fact that the sign and magnitude of a\ depends on distance from the center of the FOV 
supports the idea that the variability captured by V\ is due to PSF breathing of the telescope which 
is driven by the three-day momentum management cycle. The gradient in 02 could be due to the 
thermal coupling of an onboard heater to the optics in a tip/tilt sense. Mode weights 03 and 04 
show no spatial correlation and do not seem to depend on changes in the PSF width. 
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4. Calibrated Photometry 

4.1. Removal of Modes 

After determining which of the NMode principle components correspond to noise modes, we can 
remove them according to Equation [2j In Figure [5j we show fits to KIC-8144222 Q6 photometry 
using different combinations of TERRA principle components. We achieve uniform residuals using 
only 2 of our modes as we show quantitatively below. The simplicity of our model buys some 
insurance against overfitting. 



4.2. Performance 



For each of the residuals in Figure[5| we computed the mean depth 5F(ti) of a putative 12-hour 
transit centered at ti for every cadence in Q6. The distribution of SF due to noise determines the 
minimum transit depth that can be detected by a transit search algorithm. 5F is computed by 

SF(ti) = [8F*g](ti) 
where '*' denotes convolution and g is the following kernel 



l 



.length = Nt 



length = Nj 



length = Nt. 



where Nt is 24. For each of the cotrending schemes, we computed the following statistics describing 
the distribution of 5F: standard deviation (a), 90 percentile (90 %), and 99 percentile (99 %). The 
standard deviation is roughly equivalent to CDPP12. Since transit search algorithms key off on 
peaks in 5F, the percentile statistics are more appropriate figures of merit. We list these statistics 
for KIC-8144222 in Table [l] Ensemble-calibrated photometry produced tighter distributions in 5F 
than the spline baseline. 



4.3. Comparison to PDC 

In this section, we offer some simple comparisons between TERRA and the PDC implemen- 
tation of Twicken et al. (2010). This paper represents our efforts to improve upon that algorithm. 
The Kepler PDC pipeline has evolved beyond that presented in Twicken et al. (2010) culminating 
with PDC-MAP ( jStumpe et "aI]|20T2l |Smith et al^[20T2l ). We feel that the |Twicken et"al"1 pOlO ) 
algorithm is an important touchstone for comparison given that the most recent release of planets 
( Batalha etldTpOlI ) was based on photometry that was largely processed with the iTwicken et al. 
(20101) algorithm. 
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Fig. 5. — Least squares fits using TERRA principle components to KIC-8144222 Q6 photometry. 
The bottom panel shows 12-hour 5F where smaller scatter implies greater sensitivity to small 
transits. We show the spline fit (magenta) as a baseline since it incorporates no ensemble-based 
cotrending information. The 5F using spline detrending shows large spikes at the momentum cycle 
cusps, which are suppressed in the TERRA cotrending. Using our robust modes, we are able to 
produce a clean, calibrated light curve using only two modes. Decreased model complexity helps 
guard against overfitting. 
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Table 1. Comparison of fits to KIC-8144222 photometry. 



Cotrending 


a 


90 % 


99 % 


2 PMs 


24 


28 


53 


4 PMs 


24 


28 


53 


Spline 


53 


66 


146 


Note. 


Standard 


deviation, 


90 percentile, 


and 


99 


percentile 



(in ppm) of the 5F distributions 
for KIC-8144222 using different 
cotrending schemes. The spline 
fit is included as a baseline since 
it incorporates no ensemble-based 
cotrending information. In com- 
puting 5F, we have assumed a 12- 
hour transit duration. All cotrend- 
ing approaches yield tighter 5F dis- 
tributions than the spline baseline. 
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We assess cotrending performance in the context of transit detectability. We note that PDC 
outputs are not directly used in transit detection. PDC light curves are subject to additional 



detrending (mostly of low frequency content) before the transiting planet search is run (Tenenbaum 



et al. 2010). 



In Figure [6j we show fits to the KIC-8144222 photometry using 4 TERRA modes and the 
PDC algorithm. While PDC flattens photometry collected during the thermal transients, it injects 
high frequency noise into regions that are featureless in the TERRA-calibrated photometry. For 
KIC-8144222, the RMS scatter in the 12-hour 5F distribution is 24 ppm for TERRA processed 
photometry and 36 ppm for PDC processed photometry. 

Using 4 TERRA modes, we cotrend 100 stars selected at random from our 1000-star reference 
ensemble. We then compute 3, 6, and 12-hour 5F from TERRA and PDC calibrated light curves. 
We then calculate the difference between the a, 90%, and 99% statistics for TERRA and PDC 
cotrending. We show the distribution of these differences for the 12-hour 5F in Figure [TJ The 
median improvement in a, 90%, and 99% using TERRA cotrending is 2.8, 6.6, and 8.7 ppm. We 
tabulate the median values of the a, 90%, and 99% statistics in Table [2j 

We believe that these comparisons are representative of the stars from which we constructed 
our reference ensemble (12.5 < Kp < 13.5 and CDPP12 < 40 ppm). These bright, low-noise 
stars are the most amenable to exoearth detection. Our comparisons do not pertain to stars with 
different brightness or noise level. 



5. Conclusions 

TERRA is a new technique for using ensemble photometry to self-calibrate instrumental sys- 
tematics in Kepler light curves. We construct a simple noise model by running a high-pass filter 
and removing thermal settling events before computing principle components. For a typical 12.5 < 
Kp < 13.5 and CDPP12 < 40 ppm star, TERRA produces ensemble-calibrated photometry with 
33 ppm RMS scatter in 12 hour bins. With this noise level, a 100 ppm transit from an exoearth 
will be detected at ~ 3a per transit. 

A potential drawback of removing thermal settling events is discarding photometry that con- 
tains a transit. Thermal settling events amounted to 14% of the valid cadences in Q1-Q8 photom- 
etry. Since signal to noise grows as the square root of the number of transits, removing 14% of the 
photometry results in a 7% reduction in the signal to noise of a given transit. The completeness 
of the survey may decrease slightly, since some borderline transits will remain below threshold. 
However, this can easily be overcome by gathering 14% more data. 

Ensemble-based cotrending is most effective when the timescales in the ensemble are matched 
to the signal of interest. We are skeptical that a "one size fits all" approach exists and we encourage 
those who wish to get the most out of Kepler data to tune their cotrending to the timescale of 
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Fig. 6. — Same as Figure [5] except we compare fits using the 4 TERRA modes, with the PDC 
processed photometry. The bottom panel shows 12-hour SF where smaller scatter implies greater 
sensitivity to small transits. The RMS scatter in the 12-hour 5F distribution is 24 ppm for TERRA 
processed photometry and 36 ppm for PDC processed photometry. 
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Table 2. Comparison of TERRA and PDC cotrending performance for 100 stars. 



Transit Width a a 90% 90% 99% 99% 

(hours) TERRA PDC TERRA PDC TERRA PDC 



3 
6 

12 



58 
43 
33 



60 
45 
37 



68 
50 
39 



76 
57 
47 



129 
97 
76 



141 
105 



Note. - A comparison of the SF distributions using TERRA and 
PDC cotrending of 100 stars drawn randomly from our 1000-star sample. 
We have assumed a range of transit widths. We show the median values 
of the standard deviation, 90 percentile, and 99 percentile (in ppm) of 
the 5F distributions. For these 100 stars, TERRA yields tighter distri- 
butions of 5F. The improvement ranges from 8 to 12 ppm in the 99 % 
statistic. 
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Fig. 7. — We computed the standard deviation, 90 percentile, and 99 percentile (in ppm) of 
12-hour 5F for 100 light curves using TERRA and PDC cotrending. The histograms show the 
difference of the TERRA and PDC statistics. Negative values mean a tighter 5F distribution using 
our cotrending and hence a lower noise floor in a transit search. 



their signals of interest. 
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